Executive Summary


In  1999,  the  second  year  of  the  MCAS,  scores  for  4th-grade  students  improved  from  the  prior 
year.  The  average  score  on  English  Language  Arts  (ELA)  rose  by  4  percentiles.  Scores  rose 
broadly  -  both  near  the  top  and  near  the  bottom,  and  also  across  ethnic  groups.  This  report 
explores  these  scores  in  some  detail,  to  see  if  this  early  rise  indicates  some  fundamental 
improvement  or  merely  superficial  fluctuation. 

The  report  investigates  several  possible  reasons  for  the  rise.  The  evidence  goes  against  those 
explanations  that  imply  the  improvement  was  superficial.  By  contrast,  there  is  evidence  in  favor 
of  deeper  improvement  that  goes  beyond  4th-grade  instruction  to  3rd-grade  as  well.  Specifically: 

•  The  MCAS  scores  are  reliable  and  valid.  Several  studies  indicate  the  exam  is  reliable;  a 
student's  performance  would  be  consistent  over  time  or  on  different  versions  of  the  MCAS. 
Also,  student  performance  as  measured  by  the  MCAS  is  correlated  to  scores  on  other  widely 
recognized  tests;  the  test  provides  a  valid  measure  of  student  performance. 

•  The  improvement  was  not  due  to  an  easier  exam  in  1999.  The  MCAS  questions  are  pilot 
tested  each  year,  and  scores  are  scaled  to  adjust  for  any  change  in  the  difficulty  of  test  items. 

•  The  improvement  was  not  due  to  a  change  in  the  characteristics  of  students  taking  the 
exam  in  1999.  On  any  test,  year-to-year  fluctuations  in  the  characteristics  of  the  test-taking 
cohort  can  affect  scores.  However,  the  rise  in  the  1999  MCAS  was  not  caused  by  changes  in 
the  students  taking  the  exam.  The  1999  cohort  had  worse  3rd-grade  scores  on  the  Iowa  Test 
of  Basic  Skills  than  their  predecessors,  yet  they  scored  higher  on  the  4th-grade  MCAS. 

•  Student  learning  is  likely  to  have  increased  more  than  the  MCAS  scores  indicate.
Because  the  students  began  with  lower  3rd-grade  scores,  the  increase  in  MCAS  scores 
understates  the  improvement  students  made  between  1998  and  1999.  Adjusting  for  the  lower 
3rd-grade  scores,  the  average  4th-grade  score  improved  by  over  5  percentiles. 

•  It  is  unlikely  that  the  improvement  was  due  to  teachers  "teaching  to"  the  test  in  any
narrow  sense.  MCAS'  reading  selections  and  its  heavy  writing  and  open-ended  components 
seem  to  require  deeper  teaching  improvement.  At  the  same  time  that  MCAS  scores  rose,  so 
did  reading  scores  for  3rd-graders.  This  contemporaneous  improvement  on  a  different  exam 
suggests  that  student  achievement  is  increasing  for  reasons  beyond  the  superficial.  The 
improvement  in  performance  reaches  back  into  3rd-grade  and  may  signify  a  broader  advance. 

•  Variation  across  schools  provides  further  evidence  that  the  rise  in  4th-grade  MCAS 
scores  was  driven  by  fundamental  improvement.  Those  schools  that  exceeded 
expectations in raising 4th-grade scores were 21% more likely to have also done so with 3rd-grade
scores. That is, if the stimulus of MCAS led some schools to more effective 4th-grade
instruction,  they  also  seem  to  have  improved  more  generally. 

•  The evidence predicts further improvement in 2000. The 3rd-grade improvement
observed in 1999 would predict a further rise in 4th-grade scores for 2000 (to be released later
this  fall),  even  if  there  is  no  further  instructional  improvement,  but  especially  if  there  is. 

As  always,  some  caveats  are  in  order.  The  analysis  is  limited  to  ELA-4,  because  this  was  the 
only  MCAS  test  with  data  linked  to  a  prior  test.  The  size  of  the  improvement  should  not  be 
exaggerated,  and  two  years  do  not  make  a  trend.  But  the  evidence  does  indicate  early  signs  of 
cross-grade  improvement  in  the  crucial  literacy  skills.  This  is  exactly  the  response  that  MCAS 
was  designed  to  stimulate. 


MCAS and the Rise of Literacy Skills in the Early Grades, 1998-1999


Introduction 

In  1999,  the  second  year  of  the  MCAS,  scores  for  4th-grade  students  improved  from  the  prior 
year.  This  report  explores  the  4th-grade  English  Language  Arts  (ELA)  scores  in  some  detail,  to 
see  if  this  early  rise  indicates  some  fundamental  improvement  or  merely  superficial  fluctuation. 

We  investigate  several  possible  reasons  for  the  increase  in  scores.  The  analysis  is  based  on  a 
data  set  developed  by  the  Department  of  Education  (DOE)  that  links  4th-grade  scores  to  the  same 
students'  3rd-grade  reading  scores  on  the  Iowa  Test  of  Basic  Skills  (ITBS).  This  micro-data 
analysis  provides  additional  insight  that  complements  previous  analyses,  based  on  MCAS 
averages  at  the  school  or  district  level.1  But  it  is  still  only  an  early  step  in  the  analysis  of  MCAS 
results.  Two  years  do  not  make  a  trend.  Also,  such  analyses  are  not  yet  feasible  on  the  other 
MCAS  tests,  due  to  the  lack  of  linked  data.  Over  time,  DOE  will  assemble  a  longitudinal  data 
set  that  follows  students'  performance  on  MCAS  tests  throughout  their  careers  in  school.  When 
they  become  available,  these  data  will  allow  more  comprehensive  examination  of  the  effects  of 
the  MCAS,  including  8th  and  10th  grade  scores.  In  the  meantime,  the  more  limited  data  set 
analyzed  here  sheds  light  on  the  early  improvement  in  the  crucial  literacy  skills  measured  by 
MCAS  4th-grade  ELA  scores  and  ITBS  3rd-grade  reading  scores. 


Chart  1:  Percentile  Improvement  in  MCAS  Grade  4  ELA  Scores  in  1999 

Approximately 75,000 students took the MCAS in each of 1998 and 1999. The average score rose
by the equivalent of 4 percentiles and the median score by 4.5 percentiles.2 The improvement
came throughout the distribution of scores - students near both the top and bottom improved.
Chart 1 above illustrates the change in test scores for students at the mean, as well as at the 25th,
50th, and 75th percentile each year. Students scoring at the 25th percentile in 1999 would have
ranked at about the 29th percentile in 1998, while students at the 75th percentile would
have ranked above the 77th percentile in 1998. At the same time, the fraction of students failing
the exam fell by 2.5 percentage points (from 14.5% to 12.0%) - 1,698 fewer students failed even
though 1,428 more students took the test. Average performance rose among all minority groups,
by about the same amount as among white students.


1 Robert D. Gaudet, "Effective School Districts in Massachusetts: A Study of Student Performance on the 1999
MCAS Assessments," University of Massachusetts Donahue Institute, March 2000 and Robert D. Gaudet,
"Effective School Districts in Massachusetts: A Study of Student Performance on the 1998 MCAS Tests,"
University of Massachusetts Donahue Institute, February 1999.

2 The Department of Education records unexcused absences as scores of 200. These relatively few scores (75 in
1998 and 9 in 1999) are excluded from all analyses in this paper, as are 183 students in 1999 whose test status
was not recorded. Individual scaled MCAS scores are reported to the nearest even integer, so the distribution is
smoothed and all percentiles are interpolated. All analyses in this report are based on scaled scores, with the
translation to percentiles as a last step.
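
The cross-year percentile comparisons above (for example, where a 1999 score would have ranked in the 1998 distribution) can be sketched as follows. This is only a minimal illustration, assuming a simple midpoint rule for tied scores; the function name `percentile_rank` and the toy score list are hypothetical and are neither the DOE's data nor its exact smoothing procedure.

```python
from bisect import bisect_left, bisect_right

def percentile_rank(score, reference_scores):
    """Percentile rank of `score` within `reference_scores`.

    Ties are split evenly (midpoint rule), one simple way to interpolate
    when scores are reported only as even integers.
    """
    ordered = sorted(reference_scores)
    below = bisect_left(ordered, score)          # scores strictly lower
    tied = bisect_right(ordered, score) - below  # scores exactly equal
    return 100.0 * (below + 0.5 * tied) / len(ordered)

# Hypothetical toy distribution standing in for the 1998 scaled scores.
scores_1998 = [216, 220, 224, 228, 230, 232, 236, 240, 244, 250]

# Where would a 1999 score of 232 have ranked among the 1998 scores?
rank_in_1998 = percentile_rank(232, scores_1998)
print(rank_in_1998)
```

Ranking each year's scores against the prior year's distribution keeps the comparison in scaled-score terms until the final step, consistent with the report's statement that the translation to percentiles comes last.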

Before  exploring  why  MCAS  scores  improved,  a  number  of  issues  about  the  test  should  be 
addressed. 


Reliability  and  Validity  of  the  MCAS 

The  MCAS  was  very  carefully  designed  to  measure  student  achievement.3  Special  care  was 
taken  to  develop  accurate  and  fair  tests.  Committees  of  educators  and  nationally  recognized 
testing  experts  met  to  identify  standards,  develop  questions,  and  ensure  the  integrity  of  the  tests. 
The  questions  were  field-tested  and  reviewed  for  cultural  bias,  and  only  after  repeated  review 
and  revision  were  the  final  questions  chosen  for  the  exam.  (Examples  of  essay  questions  from 
the  exam  are  given  below.) 

One  area  of  specific  attention  was  reliability.  Reliability  refers  to  the  extent  that  a  student's 
performance  on  an  exam  would  be  consistent  over  time  or  on  different  versions  of  the  test.  If  the 
MCAS  were  not  a  reliable  test,  then  the  scores  each  year  would  have  little  meaning.  In  addition 
to  the  care  taken  to  develop  the  tests,  the  Department  of  Education  rigorously  tested  the 
reliability  of  the  MCAS  after  the  initial  test  in  1998.4  They  found  that  the  MCAS  was  as  reliable 
as  the  Advanced  Placement  examinations  and  the  Stanford-9  exam  that  the  State  of  California 
uses  for  its  recently  implemented  Standardized  Testing  and  Reporting  (STAR)  program. 

A  useful  test  must  be  valid  as  well  as  reliable.  That  is,  it  must  measure  what  it  was  designed  to 
measure.  Again,  during  the  development  of  the  MCAS,  considerable  time  and  effort  were  spent 
to  ensure  that  the  test  would  assess  the  knowledge  and  skills  described  in  the  Massachusetts 
Curriculum  Frameworks.  Before  the  test  development  process  began,  curriculum  specialists, 
content  area  experts,  and  committees  of  Massachusetts  public  school  teachers  determined  which 
of  the  standards  in  the  frameworks  could  be  measured  by  MCAS  at  each  grade  level.  They  also 
decided  what  percentage  of  the  test  would  be  dedicated  to  each  standard.  Based  on  these 
decisions,  a  test  blueprint  was  created  to  guide  the  development  and  selection  of  test  items. 
During  the  development  process,  test  questions  were  reviewed  and  piloted  to  ensure  the  validity 
of  the  MCAS. 


3 For an explanation of the procedures used to design the exams, see Appendix A of the DOE's "Guide to
Interpreting the 1998 MCAS School and District Reports," available on the internet at
ftp://www.doe.mass.edu/pub/pdf/mcas/interpguide.pdf.

4 For a complete explanation of the reliability and validity testing, see "1998 MCAS Technical Report
Summary," 1999, Commonwealth of Massachusetts Department of Education, available on the internet at
www.doe.mass.edu/mcas/archives/99/98tech_report.pdf.

To verify the validity of the MCAS, the Department of Education commissioned two external
studies  to  examine  the  1998  results.5  The  studies  compared  MCAS  scores  for  students  in  two 
large  districts  to  other  indicators  of  their  performance.  If  the  MCAS  were  not  assessing  student 
performance,  the  scores  would  have  little  relationship  to  other  measures  of  performance.  Both 
studies  found  that  the  MCAS  was  valid,  i.e.  it  was  accurately  measuring  student  achievement. 
Students  who  performed  well  on  the  MCAS  also  scored  well  on  the  Stanford  Achievement  Test 
(Stanford-9)  and  the  Metropolitan  Achievement  Test  (MAT-7).  Finally,  the  two  studies  found  a 
strong  relationship  between  a  student's  MCAS  scores  and  the  courses  the  student  reported 
taking.  Students  who  took  higher-level  courses  tended  to  perform  better  on  the  MCAS, 
especially  in  mathematics. 

While each of the two external studies examined performance for students in only a single district,
there are also broader measures of the validity of the MCAS. In math and science, the fraction of
students reaching each of the MCAS performance levels in 1998 corresponds closely with
Massachusetts's performance on the National Assessment of Educational Progress (NAEP) in
1996.6

The  DOE  also  compared  MCAS  ELA  scores  for  over  50,000  4th-grade  students  with  their 
reading  scores  the  previous  year  on  the  ITBS  and  found  a  strong  relationship.  As  the  DOE's 
summary  report  notes,  "The  correlation  of  0.75  (between  MCAS  and  ITBS  scores)  suggests  that 
the  two  tests  are  measuring  the  same  general  construct  (i.e.  reading).  We  would  not  expect  a 
correlation  of  1.0  because,  unlike  the  ITBS,  the  MCAS  tests  are  designed  exclusively  based  on 
the  Curriculum  Frameworks,  and  include  a  variety  of  item  types,  such  as  open-response  items 
and  writing  prompts."7 

The extent of agreement between the MCAS and other tests and the correspondence between
coursework and MCAS scores demonstrate that the MCAS provides a sound appraisal of
student achievement.


5 The studies were conducted by Human Resources Research Organization (Arthur A. Thacker and R. Gene Hoffman,
"Relationship Between MCAS and SAT-9 for One District in Massachusetts," 1999) and The National Center
for the Improvement of Educational Assessment, Inc. (Brian Gong, "Relationships Between Student
Performance on the MCAS and Other Tests - Collaborating District A," Grades 4 and 10, 1999). For a
summary of the findings, see the "1998 MCAS Technical Report Summary."

6 Commonwealth of Massachusetts Department of Education, "1998 MCAS Technical Report Summary," 1999,
p. 12.

7 Commonwealth of Massachusetts Department of Education, "1998 MCAS Technical Report Summary," 1999,
p. 10.


Why did 4th-grade MCAS ELA scores improve?

The  remainder  of  this  report  explores  several  potential  explanations  for  the  improvement  in  ELA 
test  scores.  To  do  so,  we  make  further  use  of  the  DOE's  data  set  linking  an  individual's  MCAS 
English  Language  Arts  score  to  the  same  student's  3rd-grade  ITBS  Reading  score  from  the  prior 
year. The data set covers over two-thirds of the test-takers. The students in the MCAS/ITBS linked
sample  are  not  quite  representative  of  the  entire  population  of  students  -  both  MCAS  and  ITBS 
scores  are  higher  in  the  linked  sample.  However,  the  change  in  test  scores  from  one  year  to  the 
next  is  similar  in  the  full  sample  and  the  linked  sample:  MCAS  scores  rose  about  one  point  (on  a 
scale  of  200-270)  and  ITBS  scores  fell  by  slightly  more  than  one  point  (on  a  range  of  126-250). 

Table 1: Mean MCAS scores for all test-takers & ITBS-linked subset

                   1998 Mean Score    1999 Mean Score    Change in
                   (population)       (population)       mean score

All test-takers    230.5              231.6              1.04
                   (n = 74,376)       (n = 75,804)

ITBS-linked        231.7              232.6              0.85
                   (n = 53,159)       (n = 51,814)

Table 2: Mean ITBS scores for all test-takers & ITBS-linked subset

                   1997 Mean Score    1998 Mean Score    Change in
                   (population)       (population)       mean score

All test-takers    193.0              191.7              -1.26
                   (n = 71,473)       (n = 74,024)

ITBS-linked        194.1              193.0              -1.04
                   (n = 53,159)       (n = 51,814)

The  improvement  in  MCAS  scores  noted  above  could  have  come  about  for  a  variety  of  reasons. 
Perhaps  the  simplest  reason  would  be  if  the  1999  exam  were  easier  than  the  1998  exam  (or  if  it 
were  scored  more  leniently  or  graded  on  an  easier  scale).  This  is  not  the  case.  The  tests  are 
designed  and  written  to  make  the  scores  directly  comparable  across  years. 

Each  year  when  students  take  the  MCAS,  the  exams  contain  questions  that  do  not  count  towards 
the  score.  These  are  potential  questions  for  the  following  year's  exam.  Including  the  questions 
on  the  exam  allows  the  test  designers  to  compare  student  responses  and  scores  on  the  actual  test 
questions  to  their  responses  to  the  pilot  test  questions  to  gauge  the  difficulty  of  the  potential 
questions.  The  Department  of  Education  is  then  able  to  scale  each  year's  scores  to  ensure  that 
the  difficulty  of  reaching  any  score  on  the  MCAS  is  consistent  from  year  to  year.9  For  example, 
the  scaled  score  to  achieve  a  passing  grade  is  220.  To  reach  that  score  in  1998  required  a  raw 
score  of  24  out  of  67  possible  points,  or  35.8%.  For  1999,  passing  required  a  raw  score  of  29 
out  of  71,  or  40.8%.  Thus  changes  in  item  difficulty  were  offset  by  changes  in  the  grading  scale. 
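
The raw-score percentages quoted above can be checked with a few lines of arithmetic. This is a minimal sketch using only the cut points given in the text; the names `passing_cut` and `share_needed` are illustrative and hypothetical.

```python
# Raw-score cut points for the scaled passing score of 220, as quoted in the text.
passing_cut = {
    1998: (24, 67),  # (raw points needed, raw points possible)
    1999: (29, 71),
}

def share_needed(raw_needed, raw_possible):
    """Percentage of possible raw points required to pass, to one decimal."""
    return round(100.0 * raw_needed / raw_possible, 1)

for year, (needed, possible) in sorted(passing_cut.items()):
    print(year, share_needed(needed, possible))
```

The same scaled score of 220 required a larger share of the raw points in 1999 than in 1998, which is what one would expect if equating raised the raw cut point to offset somewhat easier items.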


8 These "matrix-sampled" questions, which comprise approximately twenty percent of the total, differ across the
twelve forms of each test. Individual scores are based only on the common questions.

9 This process is called equating. For an explanation of the computation of scaled scores, see "MCAS 1998
Technical Report," available on the internet at www.doe.mass.edu/mcas/archives/99/98tech_report_full.pdf.

Although we can rule out an easier test as an explanation for the higher scores, there are other
possibilities  to  explore  before  concluding  that  the  rise  was  indicative  of  fundamental  change. 
One  possible  explanation  for  the  improvement  in  test  scores  is  that  perhaps  the  students  were 
simply  "better"  in  1999  than  they  were  in  1998.  Education  is  cumulative;  far  and  away  the  most 
important  predictor  of  educational  achievement  is  prior  educational  achievement.  Put  simply, 
scores  may  have  improved  in  1999  not  because  the  schools  taught  more  but  instead  because  the 
4th-grade  class  of  1999  was  more  advanced  than  the  class  ahead  of  them. 


Chart  2:  Percentile  Change  in  1998  ITBS  Grade  3  Reading  Scores 

In fact, the opposite is true. As Chart 2 shows, the cohort of students who took the MCAS in
1999  had  lower  3rd-grade  ITBS  scores  in  1998  than  its  predecessor.  Students  near  the  top, 
middle,  and  bottom  of  the  score  distribution  did  worse  on  the  ITBS  in  1998  than  their  older  peers 
had  in  1997.  This  means  that  the  improvement  in  1999  MCAS  scores  cannot  be  ascribed  to 
cohort  quality.10  If  anything,  the  1998  ITBS  scores  would  have  suggested  that  MCAS  scores 
would  fall  in  1999.  The  fact  that  they  rose,  despite  the  drop  in  the  ITBS  scores,  implies  that 
student  performance  improved  more  than  the  raw  MCAS  scores  would  indicate. 

One  way  to  consider  the  true  change  in  student  achievement  is  to  ask  what  the  1999  MCAS 
scores  would  have  been  had  the  students  begun  the  year  with  the  same  ITBS  scores  and 
characteristics  as  their  predecessors.  The  4th-graders  in  1999  scored  approximately  one  point 
lower  than  the  class  ahead  of  them  on  the  ITBS.  If  they  had  started  at  the  same  level,  they  would 
likely  have  scored  even  better  on  the  MCAS  than  they  actually  did.  In  both  1998  and  1999,  for 
every  extra  point  a  student  earned  on  the  3rd-grade  ITBS  Reading  exam,  his  or  her  expected 
MCAS  ELA  scaled  score  was  0.38  points  higher.11 
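
A back-of-envelope version of this adjustment, in scaled-score points rather than percentiles, can be sketched from the figures already quoted: the 0.38 slope and the linked-sample changes in Tables 1 and 2. This is only an illustration under the assumption of a single linear slope; it is not the report's regression adjustment, which also uses student characteristics and school fixed effects. The function name `adjusted_mcas_change` is hypothetical.

```python
SLOPE_MCAS_PER_ITBS_POINT = 0.38   # from the text
itbs_change_linked = -1.04         # Table 2, ITBS-linked subset
mcas_change_linked = 0.85          # Table 1, ITBS-linked subset

def adjusted_mcas_change(observed_change, itbs_change, slope):
    """Observed MCAS change net of the change predicted by the ITBS shift."""
    predicted_from_itbs = slope * itbs_change
    return observed_change - predicted_from_itbs

adjusted = adjusted_mcas_change(mcas_change_linked, itbs_change_linked,
                                SLOPE_MCAS_PER_ITBS_POINT)
print(round(adjusted, 2))
```

Because the ITBS change is negative, the predicted MCAS change is also negative, so subtracting it makes the adjusted gain larger than the observed gain.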


10 This holds for black and Hispanic students, as well as white students. The rise in Asian students' MCAS scores,
however, reflects in part higher ITBS scores.

11 This estimate and those in the remainder of the report refer to results calculated using regression analysis of the
data. A detailed description of the procedures and results is available in the Technical Appendix.

In Chart 3 below, the 1999 Adjusted Score illustrates what the performance of the 1999 students
would have been had they had the same characteristics as the 1998 students.12 For example, the
median MCAS score in 1999 would have ranked at about the 54th percentile in 1998. However,
if the 4th-graders' ITBS scores had not been lower, the median MCAS score in 1999 would have
improved an additional 1.2 percentiles. Similarly, the mean MCAS score rose over 5 percentiles,
after adjusting for lower ITBS scores.13 As the chart illustrates, the underlying improvement in
student performance is likely to have been larger than the rise in MCAS scores indicates.


Chart  3:  Percentile  Change  in  1999  Actual  and  Adjusted  MCAS  Grade  4  ELA  Scores 

One important concern about the rise in 1999 MCAS scores is that perhaps it does not represent
true  progress  in  student  achievement.  If  a  test  is  overly  mechanical  and  easily  coached,  scores 
can  rise  without  any  real  improvement  in  learning.  There  have,  for  example,  been  instances 
where  test  scores  rose  upon  the  introduction  of  a  new  test  at  the  same  time  that  scores  fell  on  the 
test  that  had  previously  been  required.14  That  is,  a  new  test  may  be  so  superficial  that  "teaching 
to  the  test"  raises  scores  but  does  not  result  in  broader  improvements  in  learning. 

MCAS  is  less  vulnerable  to  such  score  inflation  for  at  least  two  reasons.  First,  the  MCAS  test 
questions on which student scores are based are all new each year, so there is no inflation simply
by virtue of students and teachers learning the questions that appear year after year. Second,
MCAS is designed to encourage deeper changes in instruction aimed at higher-order skills. The
open-ended questions require coherent and well-structured essays or mathematical
problem-solving skills.


12 The numbers for the 1999 scores are slightly different than those in Chart 1 because Chart 3 uses only the linked
sample. As explained further in the Technical Appendix, the adjusted scores were calculated using the 1998
student characteristics and the estimated coefficients from a regression of 1999 MCAS scores on student
characteristics and school fixed effects.

13 Since the mean MCAS scores rose less in the linked sample (3.2 percentiles, in Chart 3), than in the full
population (4.0 percentiles, in Chart 1), the 5.1 percentile adjusted score improvement is likely an
underestimate. Extrapolating from the linked sample to the full population gives an adjusted rise of about 6
percentiles.

14 Daniel Koretz, "Using Student Assessments for Educational Accountability," in Eric A. Hanushek and Dale W.
Jorgenson, eds., Improving America's Schools: The Role of Incentives (Washington, D.C.: National Academy
Press, 1996).


Policy Report No. 6 - October 2000


For  example,  the  "long  composition"  component  of  the  ELA  exam  accounts  for  about  30%  of  the 
total  points  in  the  raw  scores.  It  is  administered  over  two  test  sessions  on  the  same  day, 
separated  by  a  short  break,  to  allow  for  drafting  and  redrafting,  with  the  opportunity  to  use  a 
dictionary. Students are granted extra time upon request. The compositions are graded according to
well-specified rubrics that are taught to the teachers who score them over the summer.15 The 4th-grade
composition prompts for 1998 and 1999, respectively, were:16

"Write  a  LETTER  to  a  third-grader  about  your  experience  in  the  fourth  grade. 
Describe  fourth  grade  so  that  someone  who  will  be  in  fourth  grade  next  year  will 
know  what  to  expect." 

"Some  days  are  more  fun  than  others.  Describe  a  day  that  was  great  for  you  and 
tell  WHY  it  was  great.  Include  details  so  the  reader  can  enjoy  the  day  as  much  as 
you  did." 

Anecdotal  evidence  indicates  that  instruction  is  changing  in  many  schools  to  stress  precisely  the 
higher-order  skills  required  for  composition  and  open-ended  questions.  If  so,  the  results  may  not 
show  up  as  immediate  rapid  increases  in  test  scores,  especially  if  the  skills  need  to  be  developed 
in  earlier  grades  and  built  upon  in  the  later  grades  where  MCAS  tests  are  administered. 

Even  with  the  design  of  the  MCAS,  it  is  impossible  to  rule  out  that  test  scores  could  change  as 
teachers  "teach  to"  the  test.  However,  as  Chart  4  illustrates,  in  1999,  at  the  same  time  that  the 
MCAS  4th-grade  scores  rose,  the  ITBS  3rd-grade  reading  scores  also  rose.  The 
contemporaneous  improvement  in  the  performance  of  different  groups  of  students  on  both  the 
MCAS  and  the  ITBS  suggests  that  the  improvement  in  the  MCAS  may  not  be  the  superficial 
type  feared.  If  schools  were  teaching  only  to  the  MCAS,  the  ITBS  scores  would  not  be  expected 
to  increase  at  the  same  time. 


15 To gauge the reasonableness of the scoring standards, see examples of actual student compositions scored at
different levels of proficiency, at www.doe.mass.edu/mcas/student/1999/.

16 The composition for 2000 was "The four seasons of the year are spring, summer, fall and winter. Choose ONE
season and describe what you like to do during that season."

Chart 4: Percentile Change in 1999 ITBS Grade 3 Reading Scores

Of course, the 1999 ITBS scores could have risen due to a cohort effect that was coincidental
with  the  increase  in  1999  MCAS.  However,  once  again  the  cohort  effect  does  not  account  for 
the  difference  in  test  scores.  The  1999  class  of  3rd-grade  students  taking  the  ITBS  had 
demographics  that  would  have  predicted  lower  scores  than  the  1998  class  (more  low  income  and 
non-English  speaking  members),  yet  their  scores  improved  nevertheless,  overall.17 

The  rise  in  ITBS  scores  in  1999  thus  appears  to  have  been  genuine.18  The  fact  that  it  occurred 
during  the  second  year  of  the  MCAS  could  indicate  that  the  MCAS  program  stimulated 
improvement  in  schools  beyond  the  MCAS  itself,  improvement  that  spilled  over  into  3rd-grade. 
This  of  course  is  the  objective  of  the  MCAS  program.  Measuring  achievement  and 
implementing  high  standards  are  meant  to  increase  performance  throughout  the  school  system. 
The  evidence  that  the  MCAS  is  beginning  to  have  this  effect  will  be  explored  in  more  detail 
below. 


17 At the lower percentiles, scores did not improve. This may be at least partially explained by the likelihood that
the adverse cohort effect was most concentrated in the lower percentiles, due to a change in the regulations
expanding the number of LEP students required to take the ITBS. In 1997 and 1998, LEP students were
required to participate in the ITBS only if they were going to be recommended for regular education the
following year. In 1999, all 3rd-grade LEP students who had attended school in the US since grade 1 were
required to take the exam. This change appears to have resulted in test scores for about 2,000 LEP students who
were not being recommended for regular education, students who would not have been tested under the old
rules. This means the lower end of the 1999 ITBS distribution was measuring a more disadvantaged group of
test-takers. (The school and district reports excluded these students, to facilitate comparability across years, as
did the DOE's summary report, but Chart 4 includes them.)

18 ITBS scores improved across all minority groups. The rise in scores was smallest for Hispanic students, but this
may have been in part due to the change in regulations covering LEP students.


School Performance on MCAS and ITBS

The  linked  MCAS/ITBS  data  set  allows  us  to  draw  further  insights  into  how  schools  responded 
to  the  MCAS.  Evaluating  a  school's  performance  in  any  year  simply  by  looking  at  the  students' 
average  scores  may  not  provide  a  fair  comparison.  As  discussed  above,  some  students  begin  4th 
grade  with  fewer  skills,  as  measured,  for  example,  by  ITBS  scores.  Suppose  the  3rd-grade 
students  in  one  school  had  scored  very  poorly  on  the  ITBS  test  and  those  in  another  school  had 
scored  extremely  well.  If  both  groups  of  students  passed  the  MCAS  in  4th-grade,  only  the  first 
school  might  be  considered  notably  successful  in  4th-grade  instruction. 

Keeping  this  point  in  mind,  we  calculate  a  measure  of  the  4th-grade  performance  of  each  school 
in  Massachusetts  for  both  1998  and  1999.  These  "school  effects"  represent  whether  the  students 
in  a  school,  given  their  ITBS  scores,  scored  well  on  the  MCAS.19  To  a  certain  extent  the  school 
effects  measure  the  average  improvement  in  student  achievement  during  4th-grade  at  each 
school.  A  large  school  effect  is  not  the  same  as  saying  that  MCAS  scores  are  high  in  a  school. 
Two  schools  could  have  the  same  MCAS  score,  but  the  school  whose  students  began  with  lower 
ITBS  scores  is  in  some  sense  a  more  effective  school,  at  the  4th-grade  level.  The  school  effect  is 
a  measure  of  "value  added"  in  4th-grade,  albeit  not  a  perfect  measure. 
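
The idea of a school effect can be illustrated with a small sketch. It assumes a deliberately simplified model with ITBS score as the only regressor, whereas the report's regressions also include other controls; the function name `school_effects` and the four toy student records are hypothetical and are not the DOE data.

```python
from collections import defaultdict

def school_effects(records):
    """Estimate per-school intercepts in: mcas = effect[school] + slope * itbs.

    `records` is an iterable of (school_id, itbs, mcas) tuples. The slope is
    estimated from within-school variation only (the "within" estimator,
    equivalent to including one indicator variable per school).
    """
    by_school = defaultdict(list)
    for school, itbs, mcas in records:
        by_school[school].append((itbs, mcas))

    # Per-school means of ITBS and MCAS scores.
    means = {
        s: (sum(x for x, _ in rows) / len(rows),
            sum(y for _, y in rows) / len(rows))
        for s, rows in by_school.items()
    }

    # Pool the within-school deviations to estimate the common slope.
    sxy = sxx = 0.0
    for s, rows in by_school.items():
        mx, my = means[s]
        for x, y in rows:
            sxy += (x - mx) * (y - my)
            sxx += (x - mx) ** 2
    slope = sxy / sxx

    effects = {s: my - slope * mx for s, (mx, my) in means.items()}
    return slope, effects

# Hypothetical students: two schools with the same mean MCAS score (232),
# but school "A" started from lower ITBS scores.
toy = [
    ("A", 180, 230), ("A", 190, 234),
    ("B", 200, 230), ("B", 210, 234),
]
slope, effects = school_effects(toy)
print(slope, effects["A"] - effects["B"])
```

In this toy example both schools have the same average MCAS score, yet school A receives the larger effect because its students began with lower ITBS scores. Only differences between school effects are meaningful here; their absolute levels depend on how the intercepts are normalized.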

We  then  compare  the  school  effects  in  1998  and  1999  and  find  a  strong  relationship.  This  is  not 
surprising  -  schools  that  added  high  value  in  1998  also  tended  to  do  so  in  1999.  However,  some 
schools  did  better  in  1999  than  would  have  been  predicted  from  their  1998  performance.20  The 
point  of  particular  interest  here  is  the  relationship  between  a  school's  tendency  to  exceed 
expectations  on  MCAS  and  on  ITBS. 

"School  effects"  on  ITBS  represent  whether  the  students  in  a  school  scored  well  on  the  ITBS, 
given  a  set  of  individual  demographic  and  other  characteristics.21  As  with  the  MCAS,  there  is  a 
strong  relationship  between  past  and  present  ITBS  school  effects.  But,  again,  some  schools  do 
better  than  predicted  in  3rd-grade  instruction,  by  this  measure.  The  interesting  finding  (which  is 
statistically  significant)  is  that  these  are  more  likely  to  be  the  same  schools  that  did  better  than 
predicted  in  adding  value  to  4th-grade  MCAS  scores.  Specifically,  schools  that  were  more 
effective  than  predicted  in  generating  high  MCAS  scores  in  1999  were  about  21%  more  likely  to 
have  also  exceeded  expectations  for  ITBS  3rd-grade  scores  (probability  of  0.540  vs.  0.447,  as 
depicted  in  Chart  5). 
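
The "about 21%" figure follows directly from the two probabilities quoted above. As a quick check of the arithmetic:

```latex
\frac{0.540}{0.447} \approx 1.21,
\qquad
0.540 - 0.447 = 0.093
```

That is, a relative difference of roughly 21 percent, or about 9 percentage points in absolute terms.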


19 The MCAS school effects are the coefficients from school indicator variables in a regression of MCAS scores on ITBS scores as well as gender and race controls.

20 As explained in the Technical Appendix, this statement and related ones below refer to residuals in regressions of 1999 school effects on 1998 school effects.

21 The ITBS school effects are the coefficients from school indicator variables in a regression of ITBS scores on race and ethnicity controls, indicators for Title 1 status, limited English proficiency (LEP), and special education status, as well as interaction terms. See the Technical Appendix for the regression results.
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Chart  5:  Probability  that  Schools  Raised  ITBS  Scores  More  Than  Predicted 
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It  is  important  to  be  clear  that  this  result  is  not  simply  the  commonplace  result  that  schools  with 
high  ITBS  scores  tend  also  to  have  high  MCAS  scores  from  year  to  year.  This  result  says 
something  entirely  different:  schools  that  tended  to  be  more  effective  than  expected  in  1999  for 
their  4th-grade  students  also  tended  to  be  more  effective  than  expected  for  their  3rd-grade 
students.  This  finding  reinforces  the  point  that  the  MCAS  scores  do  not  represent  teaching  to  the 
test  in  any  test-specific  superficial  sense,  since  these  students  are  taking  different  exams  in 
different  grades.  It  also  cannot  be  explained  by  a  cohort  effect,  because  different  students  took 
each  exam  in  1999.  Some  schools  apparently  were  able  to  improve  in  their  teaching  of  both 
groups  of  students.22 

This  relationship  is  consistent  with  the  hypothesis  that  those  schools  that  were  stimulated  most  to 
action  by  the  introduction  of  MCAS  made  improvements  in  3rd-grade  reading  instruction  as  well 
as  4th-grade  reading  and  writing.  If  so,  this  would  indicate  that  the  positive  effects  of  MCAS  go 
beyond  superficial  test  coaching  to  more  pervasive  improvements  which  reach  back  to  earlier 
grades.  When  the  2000  MCAS  reports  are  available  this  fall  we  will  learn  more,  but  based  on  the 
preliminary  evidence  presented  here,  the  MCAS  may  be  exerting  a  positive  effect  on  school 
performance  in  the  early  grades. 


22 We were not able to discern any obvious patterns in the characteristics of such schools, based on the data set at hand. For example, the correlation with average MCAS scores was quite low. Schools in urban districts were prominently represented among both the schools that most exceeded expectations and those that most fell short.
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Conclusion 

The  MCAS  was  implemented  as  part  of  the  Education  Reform  Act  of  1993  with  the  ultimate 
purpose  of  improving  the  performance  of  students  in  Massachusetts.  It  works  toward  this  goal 
on  several  fronts:  it  provides  feedback  to  allow  parents,  teachers,  and  administrators  to  monitor 
student  achievement;  it  increases  district  and  school  accountability;  and,  beginning  with  the  class 
of  2003,  students  will  be  required  to  pass  the  English  Language  Arts  and  Mathematics  exams  to 
be  eligible  for  a  high  school  diploma. 

It  is  still  early  to  determine  conclusively  how  the  introduction  of  the  MCAS  has  affected  the 
school  system.  Nevertheless,  this  review  of  the  2nd  year  MCAS  English  Language  Arts  results 
for  grade  4  demonstrates  several  important  points.  The  increase  in  scores  appears  to  be 
indicative  of  a  real  improvement  in  the  effectiveness  of  schools.  We  examined  several  other 
possible  causes  for  the  improvement  in  scores,  and  these  alternative  explanations  do  not  hold  up. 
The  improvement  cannot  be  ascribed  to  changes  in  the  student  characteristics  during  the  second 
year of the MCAS. The 4th-grade students who took the MCAS English Language Arts exam
in  1999  began  the  year  with  lower  test  scores  than  the  4th-grade  class  the  previous  year,  yet  the 
1999  class  posted  higher  MCAS  scores.  The  improvement  in  MCAS  scores  therefore  probably 
understates  the  underlying  improvement  in  4th-grade  student  achievement  between  1998  and 
1999. 

Moreover,  the  fact  that  ITBS  scores  rose  strongly  in  1999,  contemporaneous  with  the  rise  in 
MCAS  scores,  suggests  that  the  changes  do  not  come  from  "teaching  to"  the  MCAS  in  any 
narrow  sense.  Potentially  the  most  important  finding  in  this  study  concerns  the  effect  of  the 
MCAS  on  the  effectiveness  of  schools  at  the  earlier  grades.  We  find  intriguing  circumstantial 
evidence  that  the  improvement  in  student  achievement  may  be  reaching  back  into  3rd-grade. 
Not only was there a broad-based rise in 4th-grade MCAS scores, accompanied by an increase in
3rd-grade ITBS scores, but at the school level these improvements also appear to be related.
Schools  that  did  better  than  expected  in  raising  MCAS  scores  in  1999  also  tended  to  do  better 
than  expected  in  raising  3rd-grade  reading  scores. 

Finally,  if  the  improvement  in  3rd-grade  scores  reflects  a  deeper  improvement  in  school 
performance,  it  bodes  well  for  the  future.  The  cumulative  nature  of  education  means  that  schools 
may  be  able  to  build  upon  the  improvement  at  each  succeeding  grade,  so  that  the  higher  level  of 
achievement  will  continue  to  work  its  way  through  the  school  system.  The  3rd-grade 
improvement  observed  in  1999  would  predict  a  further  rise  in  4th-grade  scores  for  2000  (to  be 
released  later  this  fall),  even  if  there  is  no  further  instructional  improvement,  but  especially  if 
there  is. 

As  more  data  become  available,  the  impact  of  the  MCAS  should  and  will  be  studied  further. 
However,  the  data  available  so  far  provide  some  reason  for  optimism  about  the  future 
achievement  of  students  in  Massachusetts.  The  evidence  does  indicate  early  signs  of  cross-grade 
improvement  in  the  crucial  literacy  skills.  This  is  exactly  the  response  that  MCAS  was  designed 
to  stimulate. 
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Technical  Appendix 


Consider  regressions  for  MCAS  scores  of  the  form: 

(1)  $MCAS_{st} = \alpha_t\, ITBS_{s,t-1} + \beta_t\,(\text{race \& gender indicators})_{st} + \delta_t\,(\text{school indicators})_{t} + \varepsilon_{st}$

for  student  s  in  time  t.  The  variation  in  3rd-grade  ITBS  scores  alone  accounts  for  56%  of  the 
variation  in  4th-grade  MCAS  scores.  Relatively  little  is  added  to  explanatory  power  by  adding 
other  controls.  Race,  ethnicity,  and  gender  add  only  about  1%.  Variables  indicating  student 
status  such  as  special  education,  limited  English  proficiency,  and  low  income  add  only  about  2%. 
Race  and  gender  were  the  only  demographic  controls  available  for  both  years,  because  the  linked 
data  set  for  1998  did  not  include  the  student  status  variables,  such  as  special  education  or  LEP 
status.  However,  excluding  these  controls  should  have  little  effect  on  our  analysis  because  the 
ITBS  score  will  capture  much  of  the  impact  of  these  variables. 

Table 3: MCAS Fixed Effect Regression Results
(coefficients, with standard errors in parentheses)

Variable              1998              1999
ITBS                   0.37 (0.002)      0.38 (0.002)
Male                  -1.61 (0.06)      -1.86 (0.06)
Race Missing          -2.98 (0.23)      -0.42 (0.29)
Black                 -1.06 (0.14)      -1.28 (0.13)
Asian                  2.69 (0.17)       2.57 (0.15)
Hispanic              -0.01 (0.15)      -0.01 (0.13)
Native American       -1.30 (0.25)      -0.13 (0.45)
Mixed Race            -0.38 (0.14)       0.77 (0.43)

R-squared              0.64              0.64
Observations           53,157            51,782
Number of Schools      1,000             994

23 All regressions in this report are by ordinary least squares, with weights as appropriate (see below).
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In  both  1998  and  1999,  students  with  one  point  higher  on  the  prior  year's  ITBS  score  average 
about  0.4  points  higher  on  the  MCAS.  While  the  MCAS  and  ITBS  tests  are  scored  on  different 
scales,  a  one  standard  deviation  difference  in  the  ITBS  score  results  in  roughly  a  0.7  standard 
deviation  difference  on  the  MCAS  the  following  year.  This  estimate  is  robust  to  the  inclusion  of 
demographic  controls  (which  add  little  explanatory  power)  and  school  effects. 
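
The link between the raw coefficient and the standardized figure can be sketched as follows. The symbols $b$, $\sigma_{ITBS}$, $\sigma_{MCAS}$ and $r$ are notation introduced here for illustration; the standard deviations themselves are not reported in this section.

```latex
\hat{\beta}^{\,std} = b \cdot \frac{\sigma_{ITBS}}{\sigma_{MCAS}},
\qquad
\text{and, with ITBS as the only regressor,}\quad
\hat{\beta}^{\,std} = r = \sqrt{R^{2}} = \sqrt{0.56} \approx 0.75
```

This is in line with the "roughly 0.7" quoted above, bearing in mind that the 56% figure refers to the regression on ITBS scores alone.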

We  use  the  estimated  coefficients  to  separate  out  a  cohort  effect  from  the  change  in  mean  MCAS 
scores,  leaving  us  a  better  measure  of  a  second-year  effect.  The  change  in  average  MCAS  scores 
can  be  written  as: 


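(2)  $\overline{MCAS}_{99} - \overline{MCAS}_{98} = \hat{\lambda}_{99}(\bar{X}_{99} - \bar{X}_{98}) + \bar{X}_{98}(\hat{\lambda}_{99} - \hat{\lambda}_{98})$,

where $\hat{\lambda}_t$ is the vector of estimated coefficients and $\bar{X}_t$ is the vector of mean values of the regressors in year t. The cohort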
effect  is  captured  by  the  first  term  on  the  right  hand  side  of  the  equation.  It  represents  the  portion 
of  the  difference  in  average  MCAS  scores  that  is  due  to  changes  in  the  mean  values  of  the 
explanatory  variables,  i.e.  the  change  caused  by  differences  in  ITBS  scores  or  demographics. 
The  cohort  effect  is  estimated  to  be  negative  0.47  MCAS  points,  primarily  due  to  the  one  point 
drop  in  the  mean  ITBS  score.  This  indicates  that  the  actual  rise  in  mean  MCAS  scores,  0.85  for 
the  linked  population,  underestimates  by  a  rather  substantial  margin  what  might  have  occurred  if 
the  cohort's  incoming  skills  had  been  the  same  as  in  the  previous  year.  In  terms  of  percentiles, 
this  suggests  that  the  adverse  cohort  effect  masked  an  underlying  rise  in  MCAS  mean  scores  of 
5.1  percentiles,  rather  than  the  3.2  indicated  by  unadjusted  scores,  as  depicted  in  Chart  3  in  the 
body  of  this  report.25 

The Oaxaca decomposition in (2) holds only at the mean. To estimate cohort-adjusted scores at
points in the distribution other than the mean, we multiply the vector of 1998 independent
variables ($X_{98}$) by the 1999 estimated coefficients ($\hat{\lambda}_{99}$) and add the residuals from the 1998
regression (estimates of $\varepsilon_{98}$). This produces a distribution of adjusted scores, which are
compared with unadjusted scores in Chart 3.
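
The decomposition in (2) and the cohort-adjusted scores can be sketched in a few lines of Python. This is a simplified illustration, assuming a single regressor plus a constant in place of the full set of controls and school indicators; the function names `fit_line`, `oaxaca` and `cohort_adjusted_scores` and the sample numbers are hypothetical and are not taken from the data used in this report.

```python
def fit_line(x, y):
    """OLS intercept and slope for a single regressor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def oaxaca(x98, y98, x99, y99):
    """Split the change in the mean outcome as in equation (2).

    Returns (cohort_effect, coefficient_effect); the two sum to the
    change in the mean outcome between the two years.
    """
    a98, b98 = fit_line(x98, y98)
    a99, b99 = fit_line(x99, y99)
    mx98, mx99 = sum(x98) / len(x98), sum(x99) / len(x99)
    # First term: 1999 coefficients times the change in mean regressors.
    # The constant's regressor equals 1 in both years, so it drops out.
    cohort = b99 * (mx99 - mx98)
    # Second term: 1998 mean regressors times the change in coefficients.
    coefficient = (a99 - a98) + mx98 * (b99 - b98)
    return cohort, coefficient


def cohort_adjusted_scores(x98, y98, x99, y99):
    """1998 regressors scored with 1999 coefficients, plus 1998 residuals."""
    a98, b98 = fit_line(x98, y98)
    a99, b99 = fit_line(x99, y99)
    residuals98 = [y - (a98 + b98 * x) for x, y in zip(x98, y98)]
    return [a99 + b99 * x + e for x, e in zip(x98, residuals98)]


# Hypothetical data: the second cohort starts lower but gains more per point.
x98, y98 = [1, 2, 3], [2, 4, 6]
x99, y99 = [0, 1, 2], [1, 4, 7]
print(oaxaca(x98, y98, x99, y99))
print(cohort_adjusted_scores(x98, y98, x99, y99))
```

In this made-up example the mean outcome is unchanged between the two years, yet the decomposition reports a cohort effect of -3 offset by a coefficient effect of +3. That is the same pattern, in exaggerated form, as an adverse cohort effect masking an underlying rise.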

Unlike  the  MCAS  scores,  which  we  were  able  to  link  to  a  pre-test  (the  ITBS  scores  of  the 
previous  year),  we  have  no  pre-test  for  ITBS  scores.  We  regress  the  ITBS  scores  on 
demographic  variables  (race  and  gender),  student  status  (Title  1,  LEP,  and  special  education), 
and  school  indicators.  These  variables  account  for  39%  -  42%  of  the  variation  in  individual 
ITBS  scores  in  regressions  for  1997  -  1999.  This  is  substantially  lower  than  the  64%  figure  for 
the  MCAS  regressions,  due  to  the  lack  of  a  pre-test.   Consequently,  our  attempt  to  separate  out 


24 This technique is known in the economics literature as a Oaxaca decomposition, due to economist Ronald Oaxaca. The difference in means can also be written as $\hat{\lambda}_{98}(\bar{X}_{99} - \bar{X}_{98}) + \bar{X}_{99}(\hat{\lambda}_{99} - \hat{\lambda}_{98})$. The interpretation and the share of the total change due to changes in coefficients differ slightly. The second term in equation (2) estimates how the average student from the 1998 cohort would have performed in the 1999 classroom, compared to the 1998 classroom. The version in this footnote estimates how the average student from the 1999 cohort would have performed in a 1998 classroom, compared to the 1999 classroom. The results are similar (see footnote 25).

25 Using the alternative decomposition outlined in the previous footnote also results in a negative cohort effect, and the demographic-corrected score would rise 4.6 percentiles instead of 5.1.


14  Policy  Report  No.  6  -  October  2000 


the  cohort  effect  from  changes  in  the  ITBS  is  potentially  less  complete,  i.e.  cohort  effects  due  to 
changes  in  unobserved  variables  might  be  more  important  for  the  ITBS  analysis  than  for  the 
MCAS  analysis. 

With  that  caveat,  we  consider  the  decomposition  of  the  change  in  ITBS  scores  from  1998  -  1999. 
We  begin  by  regressing  ITBS  scores  on  the  demographic  controls  and  school  indicators: 

(3)  $ITBS_{st} = \beta_t\,(\text{race, gender, student status indicators})_{st} + \delta_t\,(\text{school indicators})_{t} + \varepsilon_{st}$

for  student  s  at  time  t. 

Table 4: ITBS Fixed Effect Regression Results
(coefficients, with standard errors in parentheses)

Variable              1997              1998              1999
Male                   -0.57 (0.13)      -0.33 (0.12)      -0.08 (0.13)
Race Missing           -2.03 (0.30)      -3.15 (0.62)      -3.82 (0.71)
Black                  -9.16 (0.30)      -9.51 (0.26)     -10.48 (0.29)
Asian                  -5.41 (0.38)      -5.72 (0.33)      -4.59 (0.36)
Hispanic              -10.64 (0.31)     -10.81 (0.27)     -10.75 (0.29)
Native American        -5.31 (1.20)      -4.36 (0.95)      -5.50 (1.11)
Mixed Race             -3.40 (0.56)      -3.58 (0.92)      -1.23 (0.95)
Title 1               -11.11 (0.27)     -12.65 (0.25)     -14.10 (0.28)
LEP                   -15.95 (0.48)     -15.97 (0.59)     -18.80 (0.61)
IEP                   -16.61 (0.21)     -19.77 (0.20)     -22.19 (0.22)

R-squared               0.39              0.41              0.42
Observations            64,791            67,168            69,590
Schools                 1,002             1,009             998

Note: regressions also include student status interaction terms.
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There  are  fewer  observations  in  Table  4  than  in  Table  2's  row  for  "All  test- takers"  because  the  regressions  only 
include  students  at  schools  that  can  be  identified  in  the  MCAS/ITBS  linked  sample.  There  are  more 
observations  in  Table  4  than  in  Table  2's  row  for  the  linked  sample,  because  not  all  student  scores  were  linked 
in  any  given  school. 
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Between  1998  and  1999,  the  mean  ITBS  score  rose  by  2.93  points,  or  about  6.6  percentiles.  The 
cohort  effect  was  actually  negative  over  this  interval,  by  0.8  points,  most  of  which  can  be 
accounted  for  by  changes  in  the  percentage  of  Title  1  and  LEP  students.  Thus  1999  ITBS  scores, 
like  those  on  the  MCAS  the  same  year,  improved  despite  a  negative  cohort  effect.  The  cohort- 
neutral  rise  in  mean  ITBS  is  estimated  at  3.71  points,  or  more  than  8  percentiles. 

Was  there  a  "Second-Year  Effect"  on  ITBS  and  MCAS? 

To  examine  the  possible  second  year  effect  of  MCAS  testing,  we  turn  first  to  the  estimated 
coefficients  on  school-specific  constant  terms  in  the  MCAS  regression  -  equation  (1).  The 
school  effects  represent  whether  the  students  in  a  school,  given  their  3rd-grade  ITBS  scores  and 
demographic  characteristics,  scored  well  on  the  MCAS.  They  are  similar  to  the  "school  gain 
index" that the state of South Carolina began using in the early 1990s to evaluate its schools.27

Much  of  the  improvement  in  MCAS  scores  between  1998  and  1999  was  across  the  board,  but 
there  was  also  cross-school  variation.  One  pattern  in  the  school  effects  is  that  schools  that  began 
with  low  school  effects  tended  to  improve  more  between  1998  and  1999.  Consider  a  regression 
of  the  change  in  school  effects  on  the  school  effect  in  1998  (standard  errors  in  parentheses):28 

(4)  $(\text{change in school effect})_j = \alpha + \beta\,(\text{1998 school effect})_j + \varepsilon_j$

     $= 83.8 - 0.52\,(\text{1998 school effect})_j$
       (3.56)  (0.02)

It  is  possible  that  some  of  the  negative  correlation  truly  represents  greater  systematic 
improvement  in  lesser-achieving  schools.  However,  such  an  inference  should  be  treated  with 
caution,  since  at  least  some  of  the  negative  correlation  with  the  initial  school  effects  represents  a 
"regression  to  the  mean,"  especially  since  the  school  effects  contain  measurement  error.  That  is, 
school  effects  that  were  unusually  high  in  1998  due  to  random  fluctuation  or  measurement  error 
were  unlikely  to  be  as  high  in  1999,  so  the  change  from  1998  to  1999  would  be  expected  to  be 
lower  than  average  in  those  schools. 
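
Regression to the mean can be illustrated with a small simulation. The sketch below is purely hypothetical and uses no data from this report: it assumes each school has a constant true effect and that each year's estimate adds independent noise, and the function name `change_on_initial_slope` and its default parameters are invented for the example.

```python
import random


def change_on_initial_slope(n_schools=5000, noise_sd=1.0, seed=0):
    """Slope from regressing (year 2 - year 1) on year 1.

    Each school's true effect is the same in both years; each year's
    estimated effect adds independent measurement noise.
    """
    rng = random.Random(seed)
    true = [rng.gauss(0.0, 1.0) for _ in range(n_schools)]
    year1 = [t + rng.gauss(0.0, noise_sd) for t in true]
    year2 = [t + rng.gauss(0.0, noise_sd) for t in true]
    change = [b - a for a, b in zip(year1, year2)]

    mean1 = sum(year1) / n_schools
    mean_change = sum(change) / n_schools
    sxy = sum((a - mean1) * (c - mean_change) for a, c in zip(year1, change))
    sxx = sum((a - mean1) ** 2 for a in year1)
    return sxy / sxx


# No school truly improves or declines, yet the slope is negative.
print(change_on_initial_slope())
# Without measurement noise the slope is zero.
print(change_on_initial_slope(noise_sd=0.0))
```

Under these assumptions the expected slope equals minus the noise variance divided by the total variance of the year-1 estimate, so a negative coefficient can arise from measurement error alone. The simulation shows only that such a pattern is possible; it does not indicate how much of the estimate in (4) is due to it.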

In  any  case,  it  will  be  useful  for  purposes  below  to  recast  (4)  with  a  fully  equivalent  regression  of 
the  levels  of  the  1999  school  effects  on  the  1998  effects. 

(5)  $(\text{1999 MCAS school effect})_j = \alpha + \gamma\,(\text{1998 MCAS school effect})_j + \varepsilon_j$

     $= 83.8 + 0.48\,(\text{1998 school effect})_j$
       (3.56)  (0.02)
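
The equivalence of (4) and (5) can be seen by writing the change as the difference of the two levels. Here $e_{j,98}$ and $e_{j,99}$ are shorthand, introduced only for this note, for the 1998 and 1999 MCAS school effects of school $j$:

```latex
e_{j,99} - e_{j,98} = \alpha + \beta\, e_{j,98} + \varepsilon_j
\;\Longrightarrow\;
e_{j,99} = \alpha + (1 + \beta)\, e_{j,98} + \varepsilon_j ,
\qquad
\gamma = 1 + \beta = 1 - 0.52 = 0.48
```

The constant, the residuals and the standard errors are unchanged; only the slope is shifted by one.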


27 The South Carolina gain index differed in several ways. First, it used prior tests in both math and reading and included interactions and squared terms for the test scores. Second, it did not include demographic controls for the students (this description is from Charles T. Clotfelter and Helen F. Ladd, "Recognizing and Rewarding Success in Public Schools," in Holding Schools Accountable, edited by Helen F. Ladd, The Brookings Institution, Washington, D.C., 1996).

28 The regression in equation (4) is weighted by the number of test-takers in the school, as are the remaining regressions in this Appendix.
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This  way  of  presenting  the  estimated  coefficients  stresses  the  point  that  the  previous  year's 
school  effect  is  the  best  predictor  of  the  current  school  effect.  Recall  that  these  school  effects 
have  already  controlled  for  the  students'  prior  year's  ITBS  scores  so  that  they  measure  the 
school's  contribution  to  student  improvement  during  4th-grade.  The  persistence  is  not  surprising 
-  schools  that  did  well  in  1998  also  tended  to  do  well  in  1999. 

Ideally,  we  would  like  to  determine  what  factors  might  explain  the  school  effect,  over  and  above 
the  previous  year's  school  effect.  There  appears  to  be  a  connection  between  the  1999  MCAS 
school  effect  and  a  school's  performance  on  the  ITBS.  To  flesh  out  this  relationship,  we  turn  to 
the  ITBS  school  effects.  The  ITBS  school  effects  are  the  estimated  coefficients  from  the  school 
specific  constants  in  (3).  They  measure  the  average  score  for  students  in  a  school,  after  adjusting 
for  the  demographic  characteristics  of  individual  students.  Specifically,  we  look  at  whether 
schools  that  contributed  more  than  expected  to  students'  ITBS  scores  also  tended  to  do  so  for 
students'  MCAS  scores. 

We  began  with  a  regression  of  ITBS  school  effects  analogous  to  (5): 

(6)  $(\text{1999 ITBS school effect})_j = \alpha + \gamma\,(\text{1998 ITBS school effect})_j + \varepsilon_j$

     $= 55.5 + 0.74\,(\text{1998 ITBS school effect})_j$
       (4.8)   (0.02)

We  then  estimated  the  residual  (call  it  Y2,  for  "year-2"  effect): 

(7)  $Y2_j = (\text{1999 ITBS school effect})_j - [\,55.5 + 0.74\,(\text{1998 ITBS school effect})_j\,]$

Then,  we  added  Y2  to  the  prior  year's  MCAS  school  effect  as  an  explanatory  variable  in  (5). 

(8)  $(\text{1999 MCAS school effect})_j = \alpha + \gamma\,(\text{1998 MCAS school effect})_j + \beta\, Y2_j + \varepsilon_j$

The  first  column  of  Table  5  below  contains  the  results  of  (5);  the  second  column  contains  the 
results  when  Y2  is  included,  i.e.  equation  (8).  The  coefficient  on  Y2  is  positive  and  highly 
significant.  This  indicates  that  schools  that  did  better  than  expected  in  generating  high  ITBS 
scores  in  1999,  given  the  demographics  of  their  students,  were  also  more  effective  on  the  MCAS 
in  1999  than  would  have  been  predicted  from  their  1998  MCAS  school  effects. 

Another  way  to  express  this  relationship  is  to  compare  the  residuals  from  (5)  and  (6).  A  positive 
residual  from  (5)  signifies  that  a  school  performed  better  than  expected  on  the  1999  MCAS, 
given  its  performance  in  1998.  Y2,  the  residual  from  (6),  is  the  analogous  measure  for  a  school's 
performance  on  the  1999  ITBS  after  controlling  for  its  performance  in  1998.  The  residuals  are 
correlated  -  schools  with  better  than  expected  results  on  the  ITBS  also  had  them  on  the  MCAS. 
These  residuals  form  the  basis  for  Chart  5  in  the  body  of  the  paper.  Of  the  schools  with  positive 
MCAS  residuals,  54%  also  had  positive  ITBS  residuals,  while  only  44.7%  of  schools  that  scored 
worse  than  expected  on  the  MCAS  scored  better  than  expected  on  the  ITBS. 
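
The comparison of residual signs can be sketched as follows. This is a minimal illustration rather than the code used for Chart 5: the helper names `weighted_fit`, `residuals` and `share_positive_itbs` are hypothetical, the inputs are made up, and the sketch assumes that both groups of schools are non-empty.

```python
def weighted_fit(x, y, w):
    """Weighted least squares intercept and slope for one regressor."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return my - slope * mx, slope


def residuals(x, y, w):
    """Actual minus predicted values from the weighted fit of y on x."""
    intercept, slope = weighted_fit(x, y, w)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def share_positive_itbs(mcas_resid, itbs_resid):
    """Share of schools with a positive ITBS residual, split by the sign
    of the MCAS residual.

    Returns (share among schools with positive MCAS residuals,
             share among the remaining schools).
    """
    better = [i > 0 for m, i in zip(mcas_resid, itbs_resid) if m > 0]
    worse = [i > 0 for m, i in zip(mcas_resid, itbs_resid) if m <= 0]
    return sum(better) / len(better), sum(worse) / len(worse)


# Hypothetical residuals for four schools.
print(share_positive_itbs([1.0, 2.0, -1.0, -2.0], [1.0, -1.0, -1.0, -3.0]))
```

In practice, the residuals from the 1999-on-1998 regressions for MCAS and for ITBS school effects would be computed with `residuals`, using the number of test-takers as weights, and then passed to `share_positive_itbs`. The two shares it returns correspond to the 54% and 44.7% figures quoted above.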


In  the  school  effect  regressions  we  only  include  the  schools  for  which  we  have  estimated  MCAS  school  effects. 
The  regressions  are  weighted  by  the  number  of  students  taking  the  1999  ITBS  exam. 




Table 5: 1999 MCAS School Effect Regressions
(coefficients, with standard errors in parentheses)

Column                       1                 2
Equation Estimated           (5)               (8)

1998 MCAS School Effect      0.48 (0.02)       0.46 (0.02)
Y2                                             0.069 (0.013)
Y2 Adjusted
Constant                     83.8 (3.6)        86.7 (3.6)

R-squared                    0.32              0.34
Observations                 982               982
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